AITopics | spectral envelope

Collaborating Authors

spectral envelope

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Quartered Chirp Spectral Envelope for Whispered vs Normal Speech Classification

Joysingh, S. Johanan, Vijayalakshmi, P., Nagarajan, T.

arXiv.org Artificial IntelligenceAug-27-2024

Whispered speech as an acceptable form of human-computer interaction is gaining traction. Systems that address multiple modes of speech require a robust front-end speech classifier. Performance of whispered vs normal speech classification drops in the presence of additive white Gaussian noise, since normal speech takes on some of the characteristics of whispered speech. In this work, we propose a new feature named the quartered chirp spectral envelope, a combination of the chirp spectrum and the quartered spectral envelope, to classify whispered and normal speech. The chirp spectrum can be fine-tuned to obtain customized features for a given task, and the quartered spectral envelope has been proven to work especially well for the current task. The feature is trained on a one dimensional convolutional neural network, that captures the trends in the spectral envelope. The proposed system performs better than the state of the art, in the presence of white noise.

classification, spectral envelope, speech, (12 more...)

arXiv.org Artificial Intelligence

2408.14777

Country:

Asia > India > Tamil Nadu > Chennai (0.05)
North America > United States > South Carolina > Charleston County > Charleston (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Quartered Spectral Envelope and 1D-CNN-based Classification of Normally Phonated and Whispered Speech

Joysingh, S. Johanan, Vijayalakshmi, P., Nagarajan, T.

arXiv.org Artificial IntelligenceAug-25-2024

Whisper, as a form of speech, is not sufficiently addressed by mainstream speech applications. This is due to the fact that systems built for normal speech do not work as expected for whispered speech. A first step to building a speech application that is inclusive of whispered speech, is the successful classification of whispered speech and normal speech. Such a front-end classification system is expected to have high accuracy and low computational overhead, which is the scope of this paper. One of the characteristics of whispered speech is the absence of the fundamental frequency (or pitch), and hence the pitch harmonics as well. The presence of the pitch and pitch harmonics in normal speech, and its absence in whispered speech, is evident in the spectral envelope of the Fourier transform. We observe that this characteristic is predominant in the first quarter of the spectrum, and exploit the same as a feature. We propose the use of one dimensional convolutional neural networks (1D-CNN) to capture these features from the quartered spectral envelope (QSE). The system yields an accuracy of 99.31% when trained and tested on the wTIMIT dataset, and 100% on the CHAINS dataset. The proposed feature is compared with Mel frequency cepstral coefficients (MFCC), a staple in the speech domain. The proposed classification system is also compared with the state-of-the-art system based on log-filterbank energy (LFBE) features trained on long short-term memory (LSTM) network. The proposed system based on 1D-CNN performs better than, or as good as, the state-of-the-art across multiple experiments. It also converges sooner, with lesser computational overhead. Finally, the proposed system is evaluated under the presence of white noise at various signal-to-noise ratios and found to be robust.

1d-cnn-based classification, pitch harmonic, speech, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s00034-022-02263-5

2408.13746

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
North America > United States > South Carolina > Charleston County > Charleston (0.04)
North America > United States > Illinois (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

Baoueb, Teysir, Liu, Haocheng, Fontaine, Mathieu, Roux, Jonathan Le, Richard, Gael

arXiv.org Artificial IntelligenceJan-30-2024

Generative adversarial network (GAN) models can synthesize highquality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues including mode collapse and divergence. In this paper, we introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN, which was initially devised for speech synthesis from mel spectrogram. In our model, the training stability is enhanced by means of a forward diffusion process which consists in injecting noise from a Gaussian distribution to both real and fake samples before inputting them to the discriminator. We further improve the model by exploiting a spectrally-shaped noise distribution with the aim to make the discriminator's task more challenging. We then show the merits of our proposed model for speech and music synthesis on several datasets. Our experiments confirm that our model compares favorably in audio quality and efficiency compared to several baselines.

diffusion process, discriminator, synthesis, (13 more...)

arXiv.org Artificial Intelligence

2402.01753

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer

Nercessian, Shahan

arXiv.org Artificial IntelligenceMay-8-2023

In this paper, we propose a differentiable WORLD synthesizer and demonstrate its use in end-to-end audio style transfer tasks such as (singing) voice conversion and the DDSP timbre transfer task. Accordingly, our baseline differentiable synthesizer has no model parameters, yet it yields adequate synthesis quality. We can extend the baseline synthesizer by appending lightweight black-box postnets which apply further processing to the baseline output in order to improve fidelity. An alternative differentiable approach considers extraction of the source excitation spectrum directly, which can improve naturalness albeit for a narrower class of style transfer applications. The acoustic feature parameterization used by our approaches has the added benefit that it naturally disentangles pitch and timbral information so that they can be modeled separately. Moreover, as there exists a robust means of estimating these acoustic features from monophonic audio sources, it allows for parameter loss terms to be added to an end-to-end objective function, which can help convergence and/or further stabilize (adversarial) training.

artificial intelligence, machine learning, synthesizer, (19 more...)

arXiv.org Artificial Intelligence

2208.07282

Genre: Research Report (0.40)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)
Transportation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Koizumi, Yuma, Zen, Heiga, Yatabe, Kohei, Chen, Nanxin, Bacchiani, Michiel

arXiv.org Artificial IntelligenceAug-4-2022

Neural vocoder using denoising diffusion probabilistic model (DDPM) has been improved by adaptation of the diffusion noise distribution to given acoustic features. In this study, we propose SpecGrad that adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality especially in the high-frequency bands. It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveform than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.

specgrad, spectrogram, vocoder, (14 more...)

arXiv.org Artificial Intelligence

2203.16749

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.61)

Add feedback